Abstract

This  paper  describes  a  general  approach  for  optimized  live  heap  space  and  live  heap  space-bound 
analyses  for  garbage-collected  languages.  The  approach  is  based  on  program  analysis  and  transformations 
and  is  fully  automatic.  In  our  experience,  the  space-bound  analysis  generally  produces  accurate  (tight) 
upper bounds in the presence of partially known input structures. The optimization drastically improves
the analysis efficiency. The analyses have been implemented and experimental results confirm their
accuracy  and  efficiency. 


1  Introduction 

Time  and  space  analysis  of  computer  programs  is  important  for  virtually  all  computer  applications,  especially 
in  embedded  systems,  real-time  systems,  and  interactive  systems.  In  particular,  space  analysis  is  becoming 
important  due  to  the  increasing  uses  of  high-level  languages  with  garbage  collection,  such  as  Java  [26],  the 
importance  of  cache  memories  in  performance  [32],  and  the  stringent  space  requirements  in  the  growing  area 
of embedded applications [29]. For example, space analysis can determine exact memory needs of embedded
applications;  it  can  help  determine  patterns  of  space  usage  and  thus  help  analyze  cache  misses  or  page  faults; 
and  it  can  determine  memory  allocation  and  garbage  collection  behavior. 

Space  analysis  is  also  important  for  accurate  prediction  of  running  time  [12].  For  example,  analysis  of 
worst-case  execution  time  in  real-time  systems  often  uses  loop  bounds  or  recursion  depths  [25,  2]  both  of  which 
are  commonly  determined  by  the  size  of  the  data  being  processed.  Also,  memory  allocation  and  garbage 
collection,  as  well  as  cache  misses  and  page  faults,  contribute  directly  to  the  running  time.  This  is  increasingly 
significant  as  the  processor  speed  increases,  leaving  memory  access  as  the  performance  bottleneck. 

Much  work  on  space  analysis  has  been  done  in  algorithm  complexity  analysis  and  systems.  The  former 
is  in  terms  of  asymptotic  space  complexity  in  closed  forms  [19].  The  latter  is  mostly  in  the  form  of  tracing 
memory behavior or analyzing cache effects at the machine level [24, 32]. What has been lacking is analysis
of  space  usage  for  high-level  languages,  in  particular,  automatic  and  accurate  techniques  for  live  heap  space 
analysis  for  languages  with  garbage  collection,  such  as  Java,  ML  or  Scheme. 

This  paper  describes  a  general  approach  for  automatic  accurate  analysis  of  live  heap  space  based  on 
program  analysis  and  transformations.  The  analysis  determines  the  maximum  size  of  the  live  data  on  the 
heap  during  execution.  This  is  the  minimum  amount  of  heap  space  needed  to  run  the  program  even  if 
garbage  collection  is  performed  whenever  garbage  is  created.  This  metric  is  useful  for  evaluating  other 
garbage  collection  schemes,  just  like  the  performance  of  an  optimal  cache  replacement  algorithm  is  useful  for 
evaluating  other  replacement  algorithms.  The  analysis  can  easily  be  modified  to  determine  related  metrics, 
such  as  space  usage  when  garbage  collection  is  performed  only  at  fixed  points  in  the  program.  It  can  also 
be  used  to  help  analyze  the  space  usage  of  some  continuously  running  processes  that  have  cyclic  behavior. 

Our approach starts with a given program written in a high-level functional language with garbage
collection. We construct (i) a space function that takes the same input as the original program and returns
the amount of space used and (ii) a space-bound function that takes as input a characterization of a set of
inputs of the original program and returns an upper bound on the space used by the original program on
any input in that set. A key problem is how to characterize the input data and exploit this information in
the analysis.

[...] Department, SUNY at Stony Brook, Stony Brook, NY 11794-4400 USA. Email: {leena, stoller, liu}@cs.sunysb.edu. Web:
www.cs.sunysb.edu/~{leena, stoller, liu}. Phone: (631)632-1627. Fax: (631)632-8334.

Report Documentation Page (Standard Form 298). Report date: January 2003. Title: Optimized Live Heap Bound Analysis.
Performing organization: State University of New York at Stony Brook, Computer Science Department, Stony Brook, NY 11794.
Distribution: approved for public release; distribution unlimited. Supplementary notes: Technical Report DAR-01-2,
February 2001 (revised January 2003). Security classification: unclassified. Number of pages: 23.

In  traditional  complexity  analysis,  inputs  are  characterized  by  their  size.  Accommodating  this  requires 
manual  or  semi-automatic  transformation  of  the  time  or  space  function  [21,  33].  The  analysis  is  mainly 
asymptotic.  A  theoretically  challenging  problem  that  arises  in  this  approach  is  optimizing  the  time-bound 
or  space-bound  function  to  a  closed  form  in  terms  of  the  input  size  [21,  28,  8].  Although  much  progress  has 
been  made  in  this  area,  closed  forms  are  known  only  for  subclasses  of  functions.  Thus,  such  optimization 
cannot be automatically done for analyzing general programs.

Rosendahl  proposed  characterizing  inputs  using  partially  known  input  structures  [28].  For  example, 
instead  of  replacing  an  input  list  l  with  its  length  n,  we  simply  use  as  input  a  list  of  n  unknown  elements. 
A  special  value  uk  (“unknown”)  is  introduced  for  this  purpose.  It  represents  unknown  primitive  values;  if  it 
represented  constructed  data,  we  wouldn’t  know  how  much  space  it  used.  At  control  points  where  decisions 
depend  on  unknown  values,  the  maximum  space  usage  of  all  branches  is  computed.  Rosendahl  concentrated 
on  proving  the  correctness  of  this  transformation  for  time-bound  analysis.  He  relied  on  optimizations  to 
obtain closed forms, but closed forms cannot be obtained for all bound functions.
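As a concrete illustration of a partially known input structure, the following Python sketch builds a list of n unknown elements. The names UK, unknown_list, and length, and the tuple encoding of cons cells, are assumptions made for this illustration; they are not taken from the analysis itself.

```python
class Unknown:
    """Marker for an unknown primitive value (the role played by uk)."""

    def __repr__(self):
        return "uk"


UK = Unknown()  # single shared instance standing for "some primitive value"


def unknown_list(n):
    """Return a list with n unknown elements, encoded as nested cons tuples.

    The empty list is None; a cons cell is the tuple ("cons", head, tail).
    """
    lst = None
    for _ in range(n):
        lst = ("cons", UK, lst)
    return lst


def length(lst):
    """Count the cons cells, which is possible because the spine is known."""
    count = 0
    while lst is not None:
        count += 1
        lst = lst[2]
    return count


shape = unknown_list(3)
```

The spine of the list is fully known, so its length and the number of cons cells it occupies can be computed, while the elements themselves stay unknown.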

Our  analysis  and  transformations  are  performed  at  source  level.  This  allows  implementations  to  be 
independent  of  compilers  and  underlying  systems  and  allows  analysis  results  to  be  understood  at  source  level. 
Our  space  bound  analysis  is  an  abstract  interpretation,  expressed  conveniently  as  a  program  transformation. 
Profiling,  like  space  functions,  measures  the  program’s  behavior  on  one  input  at  a  time;  space-bound  functions 
can  efficiently  analyze  the  program’s  behavior  on  a  set  of  inputs  at  once.  They  can  thus  be  used  to  determine 
worst-case  space  usage  of  a  program,  given  a  particular  metric  such  as  input  size.  Alternatively,  worst-case 
space  usage  may  be  determined  by  applying  the  space  function  to  a  worst-case  input.  But  in  general,  it  is 
non-trivial  to  determine  such  an  input.  Finding  space  bounds  is  undecidable,  so  space-bound  functions  may 
diverge.  In  our  experience,  this  is  rare. 

While  our  approach  can  be  applied  to  imperative  languages,  the  analysis  in  this  paper  enjoys  multiple 
benefits  due  to  the  functional  nature  of  the  source  language.  Associating  reference  counts  with  partially 
known  structures  forms  an  accurate  basis  for  determining  liveness.  With  reference  counting,  the  heap 
never  has  to  be  examined  in  its  entirety.  In  contrast,  garbage  collection  algorithms  for  imperative  languages 
generally  examine  the  whole  heap  at  once  which  is  costly.  Space-bound  functions  sometimes  have  to  evaluate 
both  branches  of  conditionals  due  to  unknown  values.  Our  analysis  does  not  need  to  maintain  multiple  copies 
of  the  heap  while  evaluating  the  two  branches  exactly  because  of  the  absence  of  imperative  update.  Copying 
the  heap  would  add  significant  complexity  and  overhead.  It  is  necessary  to  keep  the  results  of  both  branches. 
This  could  lead  to  loose  bounds  in  a  naive  analysis,  since  both  results  seem  live.  Our  analysis  handles  this 
by  examining  selected  parts  of  the  heap  at  a  limited  number  of  program  points.  If  the  source  language  were 
imperative,  the  analysis  would  need  to  examine  much  larger  parts  of  the  heap  and  do  so  more  frequently. 


2  Language 


We use a first-order, call-by-value functional language that has literal values of primitive types (e.g., Boolean
and integer constants), structured data, operations on primitive types (e.g., Boolean and arithmetic
operations), testers, selectors, conditionals, bindings, and function calls. These are fundamental program
constructs that have analogues in all programming languages. A program is a set of mutually recursive function
definitions of the form f(v_1, ..., v_n) = e, where an expression e is given by the grammar

e ::= v                            variable reference
    | l                            literal
    | c(e_1, ..., e_n)             constructor application
    | p(e_1, ..., e_n)             operation on primitive types
    | c?(e)                        tester application
    | c^{-i}(e)                    selector application
    | if e_1 then e_2 else e_3     conditional expression
    | let v = e_1 in e_2           binding expression
    | f(e_1, ..., e_n)             function application


We  sometimes  use  infix  notation  for  primitive  operations. 

For brevity, we assume the language contains only two kinds of primitive operations that take data
constructions as arguments. A tester application c?(v) returns true iff v has outermost constructor c. A
selector application c^{-i}(v) returns the i-th component of a data construction v with outermost constructor
c. Our analysis can easily be extended to handle other similar operations such as equality predicates.

Input programs to our analysis are assumed to be purely functional, but transformed programs use
arrays and imperative update. A sequential composition e_1; e_2 returns the value of e_2. In examples, we use
a constructor cons with arity 2.


3  Live  Heap  Space  Function 

To  analyze  the  live  heap  space  used  by  a  program  on  a  known  input,  we  transform  the  program  into  one 
that  performs  all  the  computations  of  the  original  program  and  keeps  track  of  the  total  amount  of  live  data. 
Liveness  of  data  is  ascertained  using  reference  counts.  The  reference  count  for  a  data  construction  v  is  the 
number  of  pointers  to  v.  These  may  be  pointers  on  the  stack,  created  by  let  bindings  or  bindings  to  formal 
parameters  of  functions,  or  pointers  on  the  heap,  created  by  data  constructions.  Data  construction  v  is  live 
if  its  reference  count  is  greater  than  0  or  if  it  is  the  result  of  the  expression  just  evaluated.  Note  that  in  the 
latter  situation,  v  is  still  accessible  to  the  program  even  though  there  are  no  references  to  it. 

A constructor count vector v has one element v[i_c] corresponding to each data constructor c used in a
given program. Let P[i_c] be the size of an instance of c. Let · denote dot product of vectors. The maximum
max(v_1, v_2) of constructor count vectors v_1 and v_2 is v_1 if v_1 · P > v_2 · P and is v_2 otherwise.
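The definition of the maximum of two constructor count vectors can be sketched in Python as follows. Representing vectors as dictionaries keyed by constructor name, and the constructor "node" with the sizes shown, are assumptions made only for this example.

```python
def dot(v, sizes):
    """Dot product of a constructor count vector with the size vector P."""
    return sum(v[c] * sizes[c] for c in sizes)


def vec_max(v1, v2, sizes):
    """Return v1 if v1 . P > v2 . P, and v2 otherwise."""
    return v1 if dot(v1, sizes) > dot(v2, sizes) else v2


P = {"cons": 2, "node": 3}   # hypothetical sizes of one instance of each constructor
a = {"cons": 3, "node": 0}   # 3 * 2 + 0 * 3 = 6 units
b = {"cons": 1, "node": 2}   # 1 * 2 + 2 * 3 = 8 units
larger = vec_max(a, b, P)
```

Note that the comparison is on total size, not component-wise: b is the maximum here even though it contains fewer cons instances than a.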

The transformation L in Figure 1 produces live heap space functions. It introduces two global variables,
live and maxlive, that satisfy: (1) for each constructor c, live[i_c] is the number of live instances of c; (2)
maxlive is the maximum value of live so far during execution. The maximum live space used during evaluation
of function f is at most ml · P where ml is the value of maxlive after evaluation of the space or bound function
for f.

Our implementation of reference counting is based on an abstract data type (ADT) that defines five
functions. new(c(x_1, ..., x_n)) returns a value v representing a new data construction c(x_1, ..., x_n), whose
reference count is initialized to zero. data(v) returns the data construction c(x_1, ..., x_n). rc(v) returns the
reference count associated with v. incrc(v) and decrc(v) increment and decrement, respectively, the reference
count associated with v. incrc and decrc are no-ops if the argument is a primitive value.
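A minimal Python sketch of such an ADT is shown below. The class name Con, the helper is_prim, and the choice to treat every non-Con Python value as primitive are assumptions of this sketch, not details of the implementation described here.

```python
class Con:
    """A data construction c(x_1, ..., x_n) with an attached reference count."""

    def __init__(self, constructor, args):
        self.constructor = constructor
        self.args = tuple(args)
        self.refcount = 0          # new constructions start at zero


def is_prim(v):
    """Assumption of this sketch: anything that is not a Con is primitive."""
    return not isinstance(v, Con)


def new(constructor, *args):
    return Con(constructor, args)


def data(v):
    return (v.constructor, v.args)


def rc(v):
    return v.refcount


def incrc(v):
    if not is_prim(v):             # no-op on primitive values
        v.refcount += 1


def decrc(v):
    if not is_prim(v):             # no-op on primitive values
        v.refcount -= 1


cell = new("cons", 1, None)
incrc(cell)
incrc(5)                           # primitive argument: nothing happens
```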

Updating  Reference  Counts.  rc(v),  for  a  data  construction  v,  is  incremented  when  v  is  bound  to  a 
variable or function parameter, or a data construction containing v as a child is created. rc(v) is decremented
when  the  scope  of  a  let  binding  for  v  ends,  a  function  call  with  an  argument  bound  to  v  returns,  or  a  data 
construction  containing  v  as  a  child  becomes  garbage. 

Updating  live  and  maxlive.  Whenever  new  data  is  constructed,  live  is  incremented,  and  maxlive  is 
recomputed.  An  auxiliary  function  gc  (“garbage  collect”)  is  called  whenever  data  can  become  garbage.  For 
a data construction v, gc(v) decrements rc(v) and then, if rc(v) is not positive, it decrements the appropriate
element of live and calls gc recursively on the children of v. A data construction may become garbage (1)
because of a decrement of its reference count or (2) because it is created in the argument of a selector or
tester and is lost to the program after the result of the selection or test is obtained. For example, cons(0, 1)
is garbage after the application of cons^{-2} in cons(cons^{-2}(cons(0, 1)), 2); note that its reference count is
always 0.

gcExcept(u,v)  is  called  when  u  should  be  garbage  collected,  v  should  not  be  garbage  collected  and  v 
might  be  a  descendant  of  u.  At  the  end  of  function  calls  and  let  expressions,  values  bound  to  parameters 
and  variables  should  be  garbage  collected  without  garbage  collecting  the  result  of  the  function  call  or  the 
let  expression.  Similarly,  after  selector  applications,  data  selected  from  should  be  garbage  collected  without 
garbage  collecting  the  selected  part,  even  if  the  reference  count  of  that  part  becomes  0.  Figure  5  contains 
an  example  of  a  live  heap  space  function. 
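The interplay of live, maxlive, gc, and gcExcept described above can be sketched in Python. This is a simplified illustration under stated assumptions: the helper construct bundles the counting and allocation steps and is called after its arguments have been built, the size vector assigns one unit to each cons, and the names Con, construct, select, and weight are hypothetical.

```python
class Con:
    """A heap cell c(x_1, ..., x_n) carrying a reference count."""

    def __init__(self, ctor, args):
        self.ctor = ctor
        self.args = tuple(args)
        self.rc = 0


SIZE = {"cons": 1}      # assumed size vector P: one unit per cons cell
live = {"cons": 0}      # number of live instances per constructor
maxlive = {"cons": 0}   # largest value of live seen so far


def is_prim(v):
    return not isinstance(v, Con)


def weight(vec):
    """Dot product of a constructor count vector with SIZE."""
    return sum(vec[c] * SIZE[c] for c in SIZE)


def construct(ctor, *args):
    """Allocate a cell, count it as live, and update maxlive."""
    live[ctor] += 1
    if weight(live) > weight(maxlive):
        maxlive.update(live)
    for a in args:
        if not is_prim(a):
            a.rc += 1               # the new cell points to its children
    return Con(ctor, args)


def gc(v):
    """Drop one reference to v; reclaim v and recurse if none remain."""
    if is_prim(v):
        return
    v.rc -= 1
    if v.rc <= 0:
        live[v.ctor] -= 1
        for child in v.args:
            gc(child)


def gc_except(u, v):
    """Collect u while protecting v, which may be a descendant of u."""
    if not is_prim(v):
        v.rc += 1
    gc(u)
    if not is_prim(v):
        v.rc -= 1


def select(i, x):
    """Return the i-th component of x, collecting x if nothing refers to it."""
    r = x.args[i - 1]
    if x.rc == 0:
        gc_except(x, r)
    return r


tail = construct("cons", 2, None)   # cons(2, nil)
pair = construct("cons", 1, tail)   # cons(1, cons(2, nil))
r = select(2, pair)                 # pair is unreferenced, so it is reclaimed
```

After the selection, the outer cell has been reclaimed while the selected tail survives with reference count 0, which is the situation gcExcept is designed for.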
f_L(v_1, ..., v_n) = L[e]    where e is the body of function f, i.e., f(v_1, ..., v_n) = e

L[v] = v
L[l] = l
L[c(e_1, ..., e_n)] = live[i_c]++;
                      if (P · live > P · maxlive)
                        then for c ∈ Constructors maxlive[i_c] := live[i_c];
                      let r_1 = L[e_1], ..., r_n = L[e_n] in
                        incrc(r_1); ...; incrc(r_n); new(c(r_1, ..., r_n))
L[p(e_1, ..., e_n)] = p(L[e_1], ..., L[e_n])
L[c?(e)] = let x = L[e] in
             let r = c?(data(x)) in
               (if not(isPrim(x)) and rc(x) = 0 then gc(x)); r
L[c^{-i}(e)] = let x = L[e] in
                 let r = c^{-i}(data(x)) in
                   (if not(isPrim(x)) and rc(x) = 0 then gcExcept(x, r)); r
L[if e_1 then e_2 else e_3] = if L[e_1] then L[e_2] else L[e_3]
L[let v = e_1 in e_2] = let v = L[e_1] in
                          incrc(v);
                          let r = L[e_2] in
                            gcExcept(v, r); r
L[f(e_1, ..., e_n)] = let r_1 = L[e_1], ..., r_n = L[e_n] in
                        incrc(r_1); ...; incrc(r_n);
                        let r = f_L(r_1, ..., r_n) in
                          gcExcept(r_1, r); ...; gcExcept(r_n, r); r

gc(v) = if not(isPrim(v))
          then decrc(v);
               if rc(v) ≤ 0
                 then live[conType(v)]--;
                      for i = 1..arity(v) gc(c^{-i}(data(v)))
gcExcept(u, v) = incrc(v); gc(u); decrc(v)


Figure 1: Transformation that produces live heap space functions f_L. isPrim(v) returns true iff v is primitive.
conType(v) returns an integer i_c that uniquely identifies the outermost constructor c in data(v). arity(v)
returns the arity of the outermost constructor in data(v).


4  Live  Heap  Space  Bound  Function 

The transformation L_b in Figures 2-3 produces live heap space-bound functions. We sometimes refer to
space-bound functions simply as bound functions. At every point during the execution of L_b[f](x), the
value of live is an upper bound on the possible values of live at the corresponding point in executions of
L[f](x'), for all x' in the set represented by x. As before, maxlive contains the maximum value of live so
far  during  execution.  The  presence  of  partially  known  inputs  in  bounds  analysis  causes  uncertainty.  For 
conditional  expressions  whose  tests  evaluate  to  uk,  both  branches  are  evaluated  to  determine  the  maximum 
live  heap  space  usage. 

Correctness  of  live  heap  bound  analysis  depends  on  keeping  track  of  all  references  and  reference  counts 
meticulously.  Summarizing  the  results  of  two  branches  into  a  single  partially  known  structure  that  represents 
both  results,  as  is  done  in  timing  analysis  [22]  and  stack  space  and  heap  allocation  analysis  [31],  does  not 
work  for  live  heap  analysis  because  it  would  be  impossible  to  keep  track  of  reference  counts  accurately.  So 
the  result  of  a  conditional  whose  test  evaluates  to  uk  is  a  separate  entity,  a  join-value,  that  points  to  both 
possible results and has its own reference count. By keeping both results live, we run the risk of obtaining
loose  bounds,  since  live  might  include  the  sizes  of  both  results  when  only  one  of  them  is  live.  To  keep  live 
as  tight  as  possible,  we  examine  join-values  at  appropriate  points  of  execution  and  manipulate  live  so  that 
it  includes  the  size  of  only  a  single  largest  data  structure  pointed  to  by  each  join-value. 

4.1  Abstract  Data  Types 

Two abstract data types (ADTs), the join-value type and the con-value ("constructed-value") type, are
used. A join-value represents a set of possible results. Join-values are created by conditional expressions
whose tests evaluate to uk and by selectors applied to join-values. Each join-value j has a list branches(j)
containing references to con-values and/or join-values. Primitive values, if any, in the set represented by j are
not stored in j. Thus, if branches(j) has only one element, j represents a choice between that element and
some primitive value. j has an associated constructor count vector exs(j). Parts of the data constructions
represented by j may be live regardless of j. Of the other parts, only those occurring in a single largest
branch are live in a worst case (i.e., maximal live heap space) execution of the original program. The sum of
the other parts that are not in the largest branch is an excess and is stored in exs(j), as discussed in Section
4.3. live does not include exs(j). When j becomes garbage, exs(j) is added to live just before garbage
collecting the branches of j.

The con-value type is an extension of the ADT described in Section 3. Con-values and join-values have
a reference count and a list of join-parents. A join-value j is a join-parent of v if branches(j) references v.
Functions rc, incrc, decrc, joinParents, addJoinParent, and delJoinParent apply to both ADTs; the names
indicate their meanings. newjoin(b) creates a join-value j with a list b of branches, and with rc(j) initialized
to 0, joinParents(j) initialized to nil, and exs(j) initialized to the zero vector, denoted V_0.

4.2  Conditionals,  Selectors  and  Testers 

Consider a conditional expression (if e_1 then e_2 else e_3) whose condition evaluates to uk. Suppose l_1, l_2
and l_3 are the values of live after evaluating e_1, e_1;e_2 and e_1;e_3, respectively. The value of live at the end of
the conditional is set to max(l_2, l_3). The result r of the conditional expression is computed by join(r_2, r_3),
where r_2 and r_3 are the results of e_2 and e_3, respectively. If r_2 and r_3 are primitive, then r is r_2 if r_2 = r_3
and uk otherwise. If r_2 and r_3 are not primitive and are the same, then r is r_2. Otherwise, r is a join-value,
and exs(r) is set to min(l_2 - l_1, l_3 - l_1). l_2 - l_1 and l_3 - l_1 are the amounts of new data in r_2 and r_3, i.e.,
the amounts of data created by e_2 and e_3. r_2 and r_3 may contain old data too, i.e., data created before
evaluating e_2 and e_3. Old data are live regardless of r. Between the sets of new data in r_2 and r_3, only one
set is live. We keep the larger set live; the size of the other set is exs(r).
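The rules above for joining two branch results and for updating live and exs can be sketched in Python. This is a sketch under assumptions rather than the implementation: vectors are dictionaries, the minimum of two vectors is taken by total size in the same way as the maximum (the text does not spell this out), and the names Con, Join, weight, diff, and after_unknown_conditional are hypothetical.

```python
class Unknown:
    def __repr__(self):
        return "uk"


UK = Unknown()


class Con:
    def __init__(self, ctor, *args):
        self.ctor, self.args = ctor, args
        self.rc = 0
        self.join_parents = []


class Join:
    def __init__(self, branches):
        self.branches = list(branches)   # only non-primitive branches are stored
        self.rc = 0
        self.join_parents = []
        self.exs = {}                    # stands for the zero vector


def is_prim(v):
    return not isinstance(v, (Con, Join))


def join(v1, v2):
    """Combine the results of the two branches of an unknown conditional."""
    if is_prim(v1) and is_prim(v2):
        return v1 if (type(v1) is type(v2) and v1 == v2) else UK
    if v1 is v2:
        return v1
    branches = [v for v in (v1, v2) if not is_prim(v)]
    result = Join(branches)
    for v in branches:
        v.rc += 1                        # the join-value references its branches
        v.join_parents.append(result)
    return result


def weight(vec, sizes):
    return sum(vec.get(c, 0) * sizes[c] for c in sizes)


def diff(a, b):
    return {c: a.get(c, 0) - b.get(c, 0) for c in set(a) | set(b)}


def after_unknown_conditional(l1, l2, l3, sizes):
    """Return (live, exs) given live before the branches and after each one."""
    live = l2 if weight(l2, sizes) > weight(l3, sizes) else l3
    d2, d3 = diff(l2, l1), diff(l3, l1)
    exs = d2 if weight(d2, sizes) < weight(d3, sizes) else d3
    return dict(live), exs


sizes = {"cons": 1}
live_after, exs = after_unknown_conditional(
    {"cons": 2}, {"cons": 3}, {"cons": 4}, sizes)
```

With two cons cells live before the branches, one branch creating one cell and the other creating two, live keeps the larger total of four cells and the excess records the one cell created by the smaller branch.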

Observe that in the transformation of the conditional (if e_1 then e_2 else e_3), we evaluate e_2 and then e_3,
making copies of only live in between. We do not have to copy the entire heap before or after evaluating
either expression because the source language does not contain imperative update. Thus, if h_1 is the heap
after the evaluation of e_1, then e_2 and e_3 modify h_1 only by adding new con-values to it. Informally, h_2,
the heap after evaluation of e_2, is just (h_1 + r_2), where r_2 is the result of e_2. Similarly, h_3 is (h_1 + r_3),
with h_3 and r_3 having the expected meanings. In other words, the heap after evaluation of the conditional is
(h_1 + (r_2 or r_3)). The choice between r_2 and r_3 is conveniently represented using a join-value that points to
them both.

Selectors and testers return uk if given uk arguments. For join-values with two non-primitive branches,
the selector or tester is first applied to the branches and the join of the results is returned. The exs field of
a join-value j that is the result of applying a selector to another join-value j' is set to V_0, because when j is
created, j' is live, and exs(j') already takes care of any excess. In fact, computing exs(j) would yield V_0 as
the result, because j' references j's children and as a result, none of the descendants of j are contained-in j.

Applying any tester other than null? to join-values with one primitive branch and one non-primitive
branch results in the join of (a) false (the result of applying any tester other than null? to any primitive
branch) and (b) the result of applying the tester to the non-primitive branch. Recall that join-values do not
save the values of their primitive branches. Applying null? to an unknown primitive branch yields uk, since
the set of primitive values contains both a null value nil and non-null elements. So, the result of null? on
join-values with a primitive branch is uk, irrespective of whether the result of null? on the non-primitive
branch is true or false.

f_Lb(v_1, ..., v_n) = L_b[e]    where e is the body of function f, i.e., f(v_1, ..., v_n) = e

L_b[v] = v
L_b[l] = l
L_b[c(e_1, ..., e_n)] = live[i_c]++;
                        if (P · live > P · maxlive)
                          then for c ∈ Constructors maxlive[i_c] := live[i_c];
                        let r_1 = L_b[e_1], ..., r_n = L_b[e_n] in
                          incrc(r_1); ...; incrc(r_n); new(c(r_1, ..., r_n))
L_b[p(e_1, ..., e_n)] = p_u(L_b[e_1], ..., L_b[e_n])
    p_u(v_1, ..., v_n) = if v_1 = uk or ... or v_n = uk then uk else p(v_1, ..., v_n)
L_b[c?(e)] = let x = L_b[e] in
               let r = c?_u(x) in
                 (if not(isPrim(x)) and rc(x) = 0 then gc(x)); r
    c?_u(v) = if v = uk then uk
              else if isJoin(v)
                then if length(branches(v)) = 1
                       then join(false, c?_u(first(branches(v))))
                       else join(c?_u(first(branches(v))), c?_u(second(branches(v))))
              else c?(data(v))
L_b[c^{-i}(e)] = let x = L_b[e] in
                   let r = c^{-i}_u(x) in
                     (if not(isPrim(x)) and rc(x) = 0 then gcExcept(x, r); recomputeExs(r)); r
    c^{-i}_u(v) = if v = uk then uk
                  else if isJoin(v)
                    then if length(branches(v)) = 1 then c^{-i}(false)
                         else join(c^{-i}_u(first(branches(v))), c^{-i}_u(second(branches(v))))
                  else c^{-i}(data(v))
L_b[if e_1 then e_2 else e_3] =
    let b = L_b[e_1] in
      if b = uk then let l_1 = copy(live) in
                     let r_2 = L_b[e_2] in
                     let l_2 = copy(live) in
                     live := l_1; let r_3 = L_b[e_3] in
                     let l_3 = copy(live) in
                     live := max(l_2, l_3); let r = join(r_2, r_3) in
                     setexs(r, min(l_2 - l_1, l_3 - l_1)); r
      else if b then L_b[e_2] else L_b[e_3]
L_b[let v = e_1 in e_2] = let v = L_b[e_1] in
                            incrc(v); let r = L_b[e_2] in (gcExcept(v, r); recomputeExs(r); r)
L_b[f(e_1, ..., e_n)] = let r_1 = L_b[e_1], ..., r_n = L_b[e_n] in
                          incrc(r_1); ...; incrc(r_n);
                          let r = f_Lb(r_1, ..., r_n) in
                            gcExcept(r_1, r); ...; gcExcept(r_n, r); recomputeExs(r); r


Figure 2: Transformation that produces live heap space-bound functions f_Lb. copy copies a vector. + and
-, when applied to vectors, denote component-wise sum and difference.
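The lifting of primitive operations to p_u in Figure 2 can be illustrated with a short Python sketch. The names UK, lift, add_u, and less_u are chosen for this example only; the use of operator.add and operator.lt as sample primitives is likewise an assumption.

```python
import operator


class Unknown:
    def __repr__(self):
        return "uk"


UK = Unknown()


def lift(p):
    """Turn a primitive operation p into p_u, which propagates unknowns."""
    def p_u(*args):
        if any(a is UK for a in args):
            return UK            # any unknown argument makes the result unknown
        return p(*args)
    return p_u


add_u = lift(operator.add)
less_u = lift(operator.lt)
```

For instance, add_u(2, 3) computes the ordinary sum, whereas add_u(2, UK) yields UK; a conditional testing less_u(UK, 1) would therefore have both branches evaluated by the bound function.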

When a selector c^{-i} is applied to a join-value j with a primitive branch, it simply aborts by attempting to
apply the selector to an arbitrary primitive value. However, if we assume that the given program never applies
selectors to primitive values, then the occurrence of c^{-i}(j) in the analysis corresponds to the application
of c^{-i} to the non-primitive branch of j in the original program. Some test in a conditional in the original
program must prevent c^{-i} from being applied to the primitive branch. Thus, with this assumption, we could
use the following alternative definition of c^{-i}_u.

    c^{-i}_u(v) = if v = uk then uk
                  else if isJoin(v)
                    then if length(branches(v)) = 1 then c^{-i}_u(first(branches(v)))
                         else join(c^{-i}_u(first(branches(v))), c^{-i}_u(second(branches(v))))
                  else c^{-i}(data(v))

Application of selectors to join-values with primitive branches is in fact seen in only one of our examples,
namely quicksort. For all of our examples, it is easy to see that for the expected input, selectors are never
applied to primitive values. Hence, it is safe to use the above definition of c^{-i}_u for all these examples. In
general, a simple static analysis could be used to automatically ascertain such program properties.

4.3  Achieving  Tightness  of  live  in  the  Presence  of  Join-values 

The  following  example  illustrates  why  live  may  not  be  as  tight  as  desired. 

    let u = cons(1, nil) in
    let v = cons(2, nil) in                                        (1)
    (if uk then cons(3, v) else cons(4, cons(5, u)))

Let r be the result of the conditional. Let c_i denote the data construction with cons^{-1}(c_i) = i. Just after
the conditional is evaluated, live includes the sizes of both c_1 and c_2. live excludes the size of c_3 because
the result of the alternative branch containing c_4 and c_5 is larger; so live includes the latter instead of the
former. Once v goes out of scope, c_2 is live only through the reference from r. At this point in any execution
of the original program, either c_2 and c_3 are live or c_4 and c_5 are live; c_1 is definitely live because of the
binding for u. But in the analysis, because of the reference from r, c_2 is kept live and its size is included in
live. Thus, join-value r causes live to be loose by one cons.
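The arithmetic behind "loose by one cons" in example (1) can be checked with a few lines of Python. The set names below are labels invented for this illustration, and each cons is counted as one unit.

```python
# Cells that are live in every execution once v has gone out of scope.
always_live = {"c1"}

# Cells live only if the corresponding branch of the conditional was taken.
then_branch = {"c2", "c3"}
else_branch = {"c4", "c5"}

# Worst case over real executions: c1 plus the larger of the two branches.
true_worst = len(always_live) + max(len(then_branch), len(else_branch))

# What the analysis counts before any adjustment: c1, c2 (kept live through
# the reference from r), and the larger branch c4, c5.
analysis_live = {"c1", "c2", "c4", "c5"}

looseness = len(analysis_live) - true_worst
```

Here true_worst is 3 and the analysis counts 4 cells, so looseness is 1, matching the one extra cons described above.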

In general, at any point at which all references to a data construction v are lost except for references from
a join-value j, there is a possibility that live is loose because it includes the size of v when it should not.
These points arise immediately after decrements to rc(v) caused by (1) a variable or parameter going out
of scope or (2) parts of data becoming garbage after the application of a selector. v may then be an excess
in live caused by a join-value j which, in case (1), is in the result of the function call or the let expression
and, in case (2), is in the result of the selector. recomputeExs, defined in Figure 3, is called on the results of
function calls, let expressions and selectors to compute the exs attributes of join-values in the results and
adjust the value of live appropriately.

Observe that v may be a part of a join-value j' that is not in the result of the function call or let expression
or selector application. It can be shown that loss of references to v at the completion of the function call,
let expression or selector application has no effect on exs(j') and so we do not call recomputeExs(j'). This
applies to tester applications also. Further, results of testers are Boolean and hence may not contain any
join-values. So, recomputeExs is not called after tester applications. Note that recomputeExs is used only
to obtain tighter bounds, so calling or not calling it at any point in the analysis is safe, i.e., still yields an
upper bound on the space usage.

We now formally define the exs attribute of join-values. Consider the stack and live heap as a graph: con-values
and join-values in the heap and formal parameters of functions and let-bound variables on the stack
are vertices; references from variables, con-values and join-values to con-values and join-values, including
references in branches attributes but excluding references in joinParents attributes, are edges. We say that
u is contained-in v if v is an ancestor of u in every path from a node for a parameter or variable to u. For a
join(v1, v2) = if eq?(v1, v2) then v1
               else if isPrim(v1)
                    then if isPrim(v2) then uk
                         else let result = newjoin([v2]) in
                              incrc(v2); addJoinParent(v2, result); result
                    else if isPrim(v2)
                         then let result = newjoin([v1]) in
                              incrc(v1); addJoinParent(v1, result); result
                         else let result = newjoin([v1, v2]) in
                              incrc(v1); addJoinParent(v1, result);
                              incrc(v2); addJoinParent(v2, result);
                              result

gc(v) = if not(isPrim(v))
        then decrc(v);
             if rc(v) ≤ 0
             then if isJoin(v)
                  then live := live + exs(v);
                       for u in branches(v)
                           delJoinParent(u, v); gc(u)
                  else live[conType(v)]--;
                       for i = 1..arity(v) gc(c^{-i}(data(v)))

recomputeExs(v) = if not(isPrim(v)) then
                    if isJoin(v)
                    then if length(branches(v)) = 2 and containedIn0(branches(v))
                         then let newexs = computeExs(v) in
                              if newexs > exs(v)
                              then live := live + exs(v) − newexs; setexs(v, newexs)
                         else for u in branches(v) recomputeExs(u)
                    else for i = 1..arity(v) recomputeExs(c^{-i}(data(v)))

containedIn0(ls) = if null(ls) then true
                   else if rc(cons^{-1}(ls)) = length(joinParents(cons^{-1}(ls))) = 1
                        then containedIn0(cons^{-2}(ls))
                        else false


Figure 3: Auxiliary functions join, gc, recomputeExs and containedIn0. For brevity, we leave out the
definition of computeExs which is based on (2) in Section 4.3.


join-value j, let $C_j$ denote the set of all con-values and join-values contained-in j, and let $G_j$ denote the graph
comprising vertices and edges reachable from j. A join-path of j is a connected subgraph of $G_j$ containing
j and constructed from $G_j$ by selecting, at every join-value j' reachable from j, exactly one branch of j'
and then, after all selections have been made, eliminating unreachable vertices and edges. Figure 4 contains
examples of join-paths. Join-paths of j correspond to data structures represented by j. conCountVec(u)
for a con-value u is a constructor count vector in which the count of the constructor type of u is 1 and all
other counts are 0. maxJoinPath(j) is the maximum of the sizes of all join-paths of j, where size(P) for a
join-path P of j is defined as

$$size(P) = \sum_{u \text{ is a con-value in } P \cap C_j} conCountVec(u)$$

Figure 4: Examples of join-paths. G1, G2 and G3 are join-paths of join-value j1. Circles denote join-values
and rectangles denote con-values.

exs(j) is then defined as follows if both branches of j are non-primitive and contained-in j (otherwise, exs(j)
is $V_0$).

$$exs(j) = total(j) - sub(j) - maxJoinPath(j) \tag{2}$$

$$total(j) = \sum_{u \text{ is a con-value in } C_j} conCountVec(u) \tag{3}$$

$$sub(j) = \sum_{u \text{ is a join-value in } C_j} exs(u) \tag{4}$$

The sums and differences of constructor count vectors are computed component-wise. (2) does not result
in vectors with negative counts, since sub(j) and maxJoinPath(j) count data in disjoint subsets of $C_j$. This
is justified as follows: if the join-path P which contributes to maxJoinPath(j) contains a join-value j', then
P contains a largest join-path of j' and exs(j') counts data in the other join-paths of j'. Hence, exs(j'),
for any descendant join-value j' of j, counts data in join-paths that are not part of P. Informally, (2) says
“subtract from live everything except a largest join-path and nodes that have already been subtracted from
live”. In our implementation, exs(j) is computed using (2) if the elements in branches(j) have reference
counts equal to 1; exs(j) is conservatively set to $V_0$ otherwise.
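
Definition (2) can be tried out on a small heap. The sketch below is an illustration under stated assumptions, not the implementation described here: the dictionary `HEAP`, the helper names, and the choice to compare join-path sizes by their total cell count are all hypothetical. The heap encodes a join-value r over two cons structures, with one extra cell `c1` held by a variable.

```python
from collections import Counter
from itertools import product

# node -> (kind, constructor type, children); an assumed encoding of the heap
HEAP = {
    "c1": ("con", "cons", []),
    "c2": ("con", "cons", []),
    "c3": ("con", "cons", ["c2"]),
    "c4": ("con", "cons", ["c5"]),
    "c5": ("con", "cons", ["c1"]),
    "r": ("join", None, ["c3", "c4"]),
}


def reachable(roots, blocked=frozenset()):
    """Nodes reachable from roots without passing through a blocked node."""
    seen, stack = set(), [n for n in roots if n not in blocked]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(c for c in HEAP[n][2] if c not in blocked)
    return seen


def contained_in(j, roots):
    """C_j: nodes for which every path from a root passes through j."""
    return reachable([j]) - {j} - reachable(roots, blocked=frozenset({j}))


def join_path(node, choice):
    """Nodes of the join-path that follows choice[j'] at every join-value j'."""
    kind, _, children = HEAP[node]
    selected = [children[choice[node]]] if kind == "join" else children
    nodes = {node}
    for child in selected:
        nodes |= join_path(child, choice)
    return nodes


def size(nodes):
    """Constructor count vector of the con-values among nodes."""
    return Counter(HEAP[n][1] for n in nodes if HEAP[n][0] == "con")


def compute_exs(j, roots, exs):
    """exs(j) = total(j) - sub(j) - maxJoinPath(j), as in (2)."""
    cj = contained_in(j, roots)
    joins = [n for n in reachable([j]) if HEAP[n][0] == "join"]
    options = product(*(range(len(HEAP[n][2])) for n in joins))
    sizes = [size(join_path(j, dict(zip(joins, pick))) & cj) for pick in options]
    max_path = max(sizes, key=lambda vec: sum(vec.values()))
    total = size(cj)
    sub = sum((exs[n] for n in cj if HEAP[n][0] == "join"), Counter())
    return total - sub - max_path


exs_with_v = compute_exs("r", ["c1", "c2", "r"], {})   # c2 also held by a variable
exs_without_v = compute_exs("r", ["c1", "r"], {})      # c2 held only through r
```

When `c2` is still referenced by a variable it is not contained-in r, so only one cons is excess; once that reference is gone, `c2` joins $C_r$ and the excess grows to two.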

A recomputed value $V_2$ of exs(j) can be less than the existing value $V_1$ of exs(j). This happens only if,
after the computation of $V_1$, selectors are applied to j creating a new join-value j' that references parts of j
and a subsequent garbage collection leads to the recomputation of exs(j) yielding $V_2$. The references from j'
to parts of j cause these parts to not be contained-in j and so (2) produces a smaller value than the existing
exs(j). But selection from j does not alter the fact that only one of the data constructions represented by j
is live, so the new smaller value of exs(j) is artificial and is ignored. Figure 5 contains an example of a live
heap space-bound function.


5  Optimizations 

We use two optimizations that reduce the asymptotic complexity of live heap analysis for many programs.
The first optimization avoids calls to recomputeExs on data without join-value descendants. This is done by
adding to con-values and join-values a boolean attribute that indicates the presence of join-value descendants.
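
A minimal sketch of the first optimization follows. The class `Value`, the field name `has_join_desc`, and the rule that a traversal stops at a con-value whose flag is false are assumptions made for illustration; the flag can be fixed at creation time because data is never updated in place.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    kind: str                                    # "con" or "join"
    children: list = field(default_factory=list)
    has_join_desc: bool = False                  # the added boolean attribute


def make(kind, *children):
    """Create a value and set its flag from its children."""
    flag = any(c.kind == "join" or c.has_join_desc for c in children)
    return Value(kind, list(children), flag)


def visited(v, use_flag):
    """Number of values a recomputeExs-style traversal would touch."""
    if use_flag and v.kind == "con" and not v.has_join_desc:
        return 0  # nothing below v can carry an exs attribute
    return 1 + sum(visited(c, use_flag) for c in v.children)


plain = make("con", make("con", make("con")))   # three cells, no join-values
mixed = make("con", make("join", make("con"), make("con")), plain)
```

On `plain` the flagged traversal does no work at all, and on `mixed` it touches only the root and the join-value instead of all seven values.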

The second optimization reduces some join-values to con-values, thus avoiding expensive manipulations
of the former. At any point p during the execution of a space-bound function, a join-value j with branches $b_1$
and $b_2$ and without any join-value descendants may be reduced to $b_1$ if $b_1$ leads to equal or greater live heap
usage as compared to $b_2$. The following discussion also holds with $b_1$ and $b_2$ interchanged. Our optimization
is conservative and reduces j to $b_1$ if $b_1$ and $b_2$ have the same shape, contain equal primitive values at
corresponding locations, and for every descendant $d_1$ of $b_1$ that is not contained-in j, the corresponding
descendant $d_2$ of $b_2$ is the same as $d_1$. If these conditions are true and additionally, if eq? is not used on j or
on data selected from j, then at point p and thereafter, $b_1$ contributes as much or more to live as compared
to $b_2$. In the following subsections, we formalize the notion of reducibility of join-values and prove that
reduction of join-values is safe, i.e., the optimized space-bound analysis provides upper bounds on live heap
usage.
reverse(ls, revls) = if null(ls) then revls
                     else reverse(cons^{-2}(ls), cons(cons^{-1}(ls), revls))

reverseL(ls, revls) = if (let l = ls in
                          let lnull? = null(data(l)) in
                          if not(isPrim(l)) and rc(l) = 0 then gc(l);
                          lnull?)
                      then revls
                      else let arg1 = (let l = ls in
                                       let lcdr = cons^{-2}(data(l)) in
                                       if not(isPrim(l)) and rc(l) = 0
                                       then gcExcept(l, lcdr);
                                       lcdr)*
                               arg2 = (live[i_cons]++; maxlive := max(live, maxlive);
                                       let lscar = sub-expression * with cons^{-2} replaced by cons^{-1}
                                           revl = revls in
                                       incrc(lscar); incrc(revl); new(cons(lscar, revl))) in
                           incrc(arg1); incrc(arg2);
                           let r = reverseL(arg1, arg2) in
                           gcExcept(arg1, r); gcExcept(arg2, r); r


reverseLb(ls, revls) = let lsnull? = (let l = ls in
                                      let lnull? = null_u(l) in
                                      if not(isPrim(l)) and rc(l) = 0 then gc(l);
                                      lnull?) in
                       if lsnull? = uk
                       then let l1 = copy(live) in
                            let branch1 = revls in
                            let l2 = copy(live) in
                            live := l1;
                            let branch2 =
                              (let arg1 = (let l = ls in
                                           let lcdr = cons^{-2}_u(l) in
                                           if not(isPrim(l)) and rc(l) = 0
                                           then incrc(lcdr); gc(l); decrc(lcdr);
                                           lcdr)*
                                   arg2 = (live[i_cons]++; maxlive := max(live, maxlive);
                                           let lscar = sub-expression * with
                                                       cons^{-2}_u replaced by cons^{-1}_u
                                               revl = revls in
                                           incrc(lscar); incrc(revl); new(cons(lscar, revl))) in
                               incrc(arg1); incrc(arg2);
                               let r = reverseLb(arg1, arg2) in
                               gcExcept(arg1, r); gcExcept(arg2, r); recomputeExs(r); r)† in
                            let l3 = copy(live) in
                            live := max(l2, l3);
                            let r = join(branch1, branch2) in
                            setexs(r, min(l2 − l1, l3 − l1)); r
                       else if lsnull?
                            then revls
                            else sub-expression †


Figure 5: Examples of space and space-bound functions. reverseL and reverseLb are the space and space-bound
functions, respectively, of reverse.
An edge-ordered rooted directed acyclic graph (DAG) is a tuple (V, E, r) where V is a set of vertices, E
is a set of edges and r is the root of the DAG, i.e., a node from which all other nodes in the DAG are reached.
An edge in E is a tuple (x, y, i), where x is the source node, y is the destination node, and i is the index of
the edge amongst the out-edges of x.

Recall that the stack and live heap may be viewed as a graph. The subgraph $G_x = (V_x, E_x, x)$ comprised
of nodes and edges reachable from a node x is an edge-ordered rooted DAG. It is acyclic because we are
dealing with a first-order functional language without imperative update. The ordering of fields in data
constructions imposes an ordering on the out-edges from nodes. For example, if x is an instance of cons and
$cons^{-2}(x) = y$ and y is not a primitive value, then $G_x$ contains the edge (x, y, 2).
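
As an illustration of this representation, the sketch below collects the vertex and edge sets of $G_x$ from a small heap. The dictionary `HEAP` and the function `graph_from` are hypothetical names, and primitive fields are modelled by Python integers and `None` (standing in for nil).

```python
# node -> (constructor, fields); a field is a node name or a primitive value
HEAP = {
    "y": ("cons", [2, None]),
    "x": ("cons", [1, "y"]),
}


def graph_from(root):
    """Return (V, E) for the edge-ordered DAG rooted at root."""
    vertices, edges, stack = set(), set(), [root]
    while stack:
        node = stack.pop()
        if node in vertices:
            continue
        vertices.add(node)
        for index, fld in enumerate(HEAP[node][1], start=1):
            if fld in HEAP:                       # only non-primitive fields give edges
                edges.add((node, fld, index))
                stack.append(fld)
    return vertices, edges


V_x, E_x = graph_from("x")
```

Here the second field of `x` refers to `y`, so the only edge is `("x", "y", 2)`, matching the example in the text.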

Two edge-ordered rooted DAGs $G_1 = (V_1, E_1, r_1)$ and $G_2 = (V_2, E_2, r_2)$ are isomorphic if there exists a
one-to-one, onto mapping $f : V_1 \to V_2$ such that $(x, y, i) \in E_1$ iff $(f(x), f(y), i) \in E_2$, and x and f(x) are
data constructions of the same type. If f(x) = y, we say that x corresponds to y and vice versa.

Reducibility of Join-values. $j = join(b_1, b_2)$ is reducible to $b_1$ at a point $p_0$ during execution of a
space-bound function if at $p_0$

R0. j does not have any join-value descendants.

R1. $G_{b_1}$ and $G_{b_2}$, the DAGs rooted at $b_1$ and $b_2$, are isomorphic.

R2. Corresponding primitive values in $b_1$ and $b_2$ and their descendants are equal, taking uk = uk.

R3. For every node $d_1$ of $G_{b_1}$: if $d_1$ is not contained-in j, then $d_1$ and $f(d_1)$ are the same node.

R0 implies that j represents exactly two data structures: $b_1$ and $b_2$. R1 and R2 state that $b_1$ and $b_2$ have the
same structure and contents. For example, $b_1$ and $b_2$ may be two lists of the same length and containing the
same primitive values. The only possible difference between $b_1$ and $b_2$ is the particular heap locations they
use. No operation in our language can distinguish $b_1$ and $b_2$; recall that we don't consider eq?. Thus, R1, R2
and R4 ensure that the program's execution is the same regardless of whether $b_1$ or $b_2$ is used, except for the
heap space used by $b_1$ and $b_2$; specifically, using $b_1$ vs. $b_2$ does not affect any other heap allocations or live
heap usage of the program. R3 asserts that $b_1$ always contributes at least as much to the live heap space as
$b_2$. For example, this happens if $b_2$ references data constructions that are live even without references from
$b_2$ and the corresponding data constructions in $b_1$ are live only because of the references from $b_1$.

As  an  example,  consider  the  following  expression. 

let  v  =  cons(uk,  nil)  in 

if  uk  then  cons  (uk,  cons  (uk,  nil))  else  cons(uk,v) 

The abstract heap at the point just after evaluating the conditional is shown above. Con-values c1 through
c4 are numbered according to the order of creation. The result j of the conditional satisfies conditions R0
through R4, and hence may be reduced to its left branch.
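
The conditions R1 to R3 can be checked mechanically on this example. The sketch below is an illustration under assumptions: the cell names `t1`, `t2` and `e1`, the dictionary `HEAP`, and the functions `match` and `reducible_to` are hypothetical, R0 holds trivially because the heap contains no join-values, and the join-value j itself is referenced only as the result of the conditional.

```python
# node -> (constructor, fields); "uk" and None (nil) are primitive values
HEAP = {
    "v": ("cons", ["uk", None]),     # let v = cons(uk, nil)
    "t1": ("cons", ["uk", None]),    # inner cons of the then-branch
    "t2": ("cons", ["uk", "t1"]),    # b1 = cons(uk, cons(uk, nil))
    "e1": ("cons", ["uk", "v"]),     # b2 = cons(uk, v)
}


def reachable(roots):
    """Nodes reachable from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(f for f in HEAP[n][1] if f in HEAP)
    return seen


def match(b1, b2):
    """R1 and R2: return the isomorphism f from G_b1 to G_b2, or None."""
    f, inverse, stack = {}, {}, [(b1, b2)]
    while stack:
        x, y = stack.pop()
        if x in f or y in inverse:
            if f.get(x) != y or inverse.get(y) != x:
                return None                      # f would not be one-to-one
            continue
        (cx, fx), (cy, fy) = HEAP[x], HEAP[y]
        if cx != cy or len(fx) != len(fy):
            return None                          # different constructor types
        f[x], inverse[y] = y, x
        for a, b in zip(fx, fy):
            if (a in HEAP) != (b in HEAP):
                return None                      # edge on one side only
            if a in HEAP:
                stack.append((a, b))
            elif a != b:
                return None                      # R2: primitives must be equal
    return f


def reducible_to(b1, b2, variables):
    """Check R1-R3 for j = join(b1, b2), given the other variables in scope."""
    f = match(b1, b2)
    if f is None:
        return False
    outside = reachable(variables)               # nodes not contained-in j
    return all(f[d] == d for d in f if d in outside)   # R3


left_ok = reducible_to("t2", "e1", ["v"])
right_ok = reducible_to("e1", "t2", ["v"])
```

With v in scope, the join is reducible to its left branch but not to its right one, because the right branch shares the cell bound to v while the corresponding cell of the left branch is live only through j.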

Theorem 1 If $j = join(b_1, b_2)$ is reducible to $b_1$ at a point $p_0$ during execution of a space-bound function,
then it is safe to replace all references to j with references to $b_1$, i.e., in the presence of these replacements
the space-bound analysis still returns an upper bound on live heap usage.
Proof: Based on the above arguments, it suffices to show that $b_1$ contributes at least as much to the live
heap space as $b_2$.

Let $G_{b_1} = (V_{b_1}, E_{b_1}, b_1)$ and $G_{b_2} = (V_{b_2}, E_{b_2}, b_2)$ be the edge-ordered DAGs rooted at $b_1$ and $b_2$,
respectively. Let $C_j$ be the set of nodes contained-in j at $p_0$. The contribution of $b_i$ (i = 1, 2) to live is the amount of data
contained-in j and referenced by $b_i$. Observe that data not contained-in j is live regardless of whether a
replacement of j with $b_1$ or $b_2$ is performed. The contribution of j to live is the maximum of the contributions
of $b_1$ and $b_2$. At $p_0$,


$$\text{contrib. of } b_1 \text{ to } live = \sum_{x \in (V_{b_1} \cap C_j)} conCountVec(x) \tag{5}$$

$$\text{contrib. of } b_2 \text{ to } live = \sum_{x \in (V_{b_2} \cap C_j)} conCountVec(x) \tag{6}$$


First, we show that (5) is at least as large as (6). R1 states that $G_{b_1}$ and $G_{b_2}$ are isomorphic. Let $f : V_{b_1} \to V_{b_2}$
be the isomorphism. We show that at $p_0$, for every node $y \in (V_{b_2} \cap C_j)$, $f^{-1}(y) \in (V_{b_1} \cap C_j)$. By definition,
$f^{-1}(y) \in V_{b_1}$. We prove by contradiction that $f^{-1}(y) \in C_j$. Suppose not. Then, according to R3, $f^{-1}(y) = y$.
Hence, $y \notin C_j$. This contradicts the assumption that $y \in (V_{b_2} \cap C_j)$, so $f^{-1}(y) \in C_j$. Also, y and $f^{-1}(y)$
are data constructions of the same type. Thus, for each element of the sum in (6), there is a corresponding
and equal element of the sum in (5), so (5) is at least as large as (6).

Now, consider a point $p_1$ after $p_0$. If j is dead at $p_1$, then the replacement has no more effect on the
execution of the space-bound function; the indirect effect through parts of j that were selected out and that
are still live is considered below. Suppose j is live at $p_1$. Observe that $G_{b_1}$ and $G_{b_2}$ do not change between
$p_0$ and $p_1$, due to the absence of imperative update, and that f is still an isomorphism between $G_{b_1}$ and
$G_{b_2}$. We use $C_x^p$ to refer to the set of nodes contained-in node x at point p; p is omitted when it is clear
from context. We show that, for every node $y \in (V_{b_2} \cap C_j^{p_1})$, $f^{-1}(y) \in C_j^{p_1}$. Case 1: $f^{-1}(y) \notin C_j^{p_0}$. Then,
according to R3, $f^{-1}(y) = y$. By hypothesis, $y \in C_j^{p_1}$, so $f^{-1}(y) \in C_j^{p_1}$. Case 2: $f^{-1}(y) \in C_j^{p_0}$. We prove by
contradiction that $f^{-1}(y) \in C_j^{p_1}$. $f^{-1}(y) \in C_j^{p_0}$ implies that after $p_0$, the only way to create references to
$f^{-1}(y)$ from outside $G_j$ (the DAG rooted at j) is through selector applications to j. This is thus the only way
for $f^{-1}(y)$ to become not contained-in j after $p_0$. Since selector applications to j return join-values whose
branches are corresponding nodes in $V_{b_1}$ and $V_{b_2}$, such join-values that reference $f^{-1}(y)$ must also reference
y. So, if $f^{-1}(y) \notin C_j^{p_1}$ (as hypothesized for the proof by contradiction), then $y \notin C_j^{p_1}$. This contradicts the
assumption $y \in (V_{b_2} \cap C_j^{p_1})$. Hence, $f^{-1}(y) \in C_j^{p_1}$.

We now show that after $p_0$, the result of a selector application to $b_1$ also contributes at least as much to
live as compared to the result of the same selector application to $b_2$. Let the result of $c_1^{-i_1}(\ldots(c_n^{-i_n}(j)))$ be
$j' = join(b'_1, b'_2)$ where $b'_1 \in V_{b_1}$ and $b'_2 \in V_{b_2}$, and let $p_1$ be a point equal to or after $p_0$ at which j' is live.
At $p_1$,

$$\text{contrib. of } b'_1 \text{ to } live = \sum_{x \in (V_{b'_1} \cap C_{j'}^{p_1})} conCountVec(x) \tag{7}$$

$$\text{contrib. of } b'_2 \text{ to } live = \sum_{x \in (V_{b'_2} \cap C_{j'}^{p_1})} conCountVec(x) \tag{8}$$


Observe that the restriction of f to any subgraph G of $G_{b_1}$ is an isomorphism between G and the corresponding
subgraph of $G_{b_2}$. So, the restriction of f to $G_{b'_1}$ is an isomorphism between $G_{b'_1}$ and $G_{b'_2}$. We refer to this
restriction of f as just f. We prove that for every $y \in (V_{b'_2} \cap C_{j'}^{p_1})$, $f^{-1}(y)$ is in $(V_{b'_1} \cap C_{j'}^{p_1})$. We consider
two cases. Case 1: $f^{-1}(y) \notin C_j^{p_0}$. Then, $f^{-1}(y) = y$, because of R3. y is in $C_{j'}^{p_1}$, so $f^{-1}(y)$ is in $C_{j'}^{p_1}$. Case
2: $f^{-1}(y) \in C_j^{p_0}$. We prove by contradiction that $f^{-1}(y) \in C_{j'}^{p_1}$. Suppose this is not so. Then, there exist
references to $f^{-1}(y)$ from outside $G_{j'}$, the DAG rooted at j'. At $p_0$, $f^{-1}(y) \in C_j$, so any references to $f^{-1}(y)$
from outside $G_{j'}$ are created after $p_0$, implying $p_1$ is after (not equal to) $p_0$. Furthermore, these references
must be due to selector applications to j. Thus, at $p_1$, references to $f^{-1}(y)$ from outside $G_{j'}$ are either from
j or from the results of selector applications to j. Since both j and results of selector applications to j are
join-values that reference corresponding nodes in $V_{b_1}$ and $V_{b_2}$, there exist references from outside $G_{j'}$ to y
also, so $y \notin C_{j'}^{p_1}$. This contradicts the assumption $y \in (V_{b'_2} \cap C_{j'}^{p_1})$. Hence, $f^{-1}(y)$ is in $C_{j'}^{p_1}$. Thus, for each
element of the sum in (8), there is a corresponding and equal element of the sum in (7), so (7) is at least as
large as (8).

6  Handling  Tail  Call  Optimization 

In  some  languages,  the  environment  of  the  caller  is  discarded  only  after  the  completion  of  all  function  calls 
in  its  body,  even  if  the  body  contains  a  function  call  in  tail  position.  An  expression  is  in  tail  position  if  it  is 
the  last  thing  that  the  function  does  before  returning.  A  tail  call  is  a  function  call  in  tail  position.  Tail  call 
optimization  [1]  allows  the  caller’s  environment  to  be  discarded  right  before  executing  a  tail  call.  We  have 
extended  our  analysis  to  reflect  the  effect  of  tail  call  optimization.  In  the  presence  of  this  optimization,  the 
analysis  described  in  Sections  3  and  4  yields  safe  but  perhaps  loose  bounds  on  space  usage. 
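
The gap between the two disciplines can be seen on a toy model. The sketch below is an illustration only: it assumes a hypothetical function f(n) = let v = cons(n, nil) in if n = 0 then nil else f(n - 1), in which v is not passed to the tail call, and simply counts how many cells are live at the peak.

```python
def peak_live(n, tail_call_optimized):
    """Peak number of live cons cells while evaluating the hypothetical f(n)."""
    live = peak = 0
    held_by_callers = 0
    for _ in range(n + 1):               # the calls f(n), f(n - 1), ..., f(0)
        live += 1                        # allocate the cell bound to v
        peak = max(peak, live)
        if tail_call_optimized:
            live -= 1                    # environment discarded before the tail call
        else:
            held_by_callers += 1         # v stays live until the callee returns
    live -= held_by_callers              # environments discarded on return
    assert live == 0
    return peak


without_tco = peak_live(9, tail_call_optimized=False)
with_tco = peak_live(9, tail_call_optimized=True)
```

An analysis that keeps the caller's environment until the call returns reports the larger figure, which remains an upper bound but is loose for an implementation that discards the environment before the tail call.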

Our extended analysis recognizes function calls in tail position and, at the sites of these calls, garbage
collects all variables in the current scope. In the original analysis, rc's of arguments are incremented just
before function calls and arguments are garbage collected on return from the calls. In the extended analysis,
both operations are performed within the function body: increments are done at the start of the function
and garbage collection is done just before the result is returned. The latter is performed either just before a
tail call or on completion of the last non-call expression in the function body. This structure provides for a
simple transformation as well as simpler and more efficient space and bound functions. Figure 6 contains the
new transformation that yields space functions. The right hand side uses quasi-quotes: all expressions
within quotes, except those of the form “Lx[e] ...”, are to be taken literally. We make the exception to
minimize clutter.

Figures 7, 8 and 9 contain the transformation that yields bound functions. The transformation of
conditionals is more involved than in Lb. Consider a conditional (if uk then $e_2$ else $e_3$) that is in tail position.
Both $e_2$ and $e_3$ need to be evaluated. Suppose both $e_2$ and $e_3$ contain tail calls. During the evaluation
of $e_2$, all environment variables $u_1, \ldots, u_m$ are garbage collected just before the tail call. But at the start
of evaluation of $e_3$, $u_1, \ldots, u_m$ are live, so the effects of the earlier garbage collection have to be reversed
before evaluating $e_3$. This is done by using recincrc to increment rc's of $u_1, \ldots, u_m$ and all their descendants.
Further, during the evaluation of $e_3$, $u_1, \ldots, u_m$ are garbage collected just before the tail call in $e_3$. Care
needs to be taken to ensure that references from the result $r_2$ of $e_2$ to $u_1, \ldots, u_m$ do not cause the latter to
be counted as live after this garbage collection. So, before evaluating $e_3$, the rc's of $r_2$ and all its descendants
are decremented. Similar issues arise when only one of the two branches contains tail calls. Consider the
case when neither of the branches contains tail calls. In each branch, $u_1, \ldots, u_m$ are garbage collected after
the evaluation of the last non-call expression in the branch. Instead of garbage collecting $u_1, \ldots, u_m$ at the
end of $e_2$, and again incrementing their rc's and those of their descendants before evaluating $e_3$, we simply
evaluate $e_2$ and $e_3$ as if they were not in tail position and garbage collect $u_1, \ldots, u_m$ just before returning
the result of the conditional.

tlred in Figure 9 is used to determine whether or not a given expression contains tail calls. More
precisely, given an expression e in tail position, tlred(e) determines if all variables in the scope of e are
garbage collected in Lxb[e] itself or if they have to be garbage collected after Lxb[e]. This more precise definition
is particularly relevant to conditionals. Consider a conditional e = (if $e_1$ then $e_2$ else $e_3$) in which one
branch, say $e_2$, contains a tail call and the other does not. In $e_2$, $u_1, \ldots, u_m$ are garbage collected before the
tail call. $l_2$ in Lxb[e] is the value of live after the evaluation of $e_2$. We need to garbage collect $u_1, \ldots, u_m$ at
the end of $e_3$ in order to ensure that the value of live after the evaluation of $e_3$ ($l_3$ in Lxb[e]) corresponds to
the same “execution point” as $l_2$. If neither $e_2$ nor $e_3$ contained tail calls, it would be sufficient to garbage
collect $u_1, \ldots, u_m$ after the entire conditional is evaluated. Therefore, tlred(e) is true exactly when either
tlred($e_2$) or tlred($e_3$) is true.
f_Lx(v1, ..., vn) = ‘incrc(v1); ...; incrc(vn);
                     Lx[e] true’
    where e is the body of function f, i.e., f(v1, ..., vn) = e

Lx[v] tailpos? = if tailpos? then ‘gcExcept(u1, v); ...; gcExcept(um, v); v’ else ‘v’
Lx[l] tailpos? = if tailpos? then ‘gc(u1); ...; gc(um); l’ else ‘l’

Lx[c(e1, ..., en)] tailpos? = ‘live[i_c]++; if (P · live > P · maxlive)
                               then for c ∈ Constructors maxlive[i_c] := live[i_c];
                               let r1 = ’ Lx[e1] false ‘, ..., rn = ’ Lx[en] false ‘ in
                               incrc(r1); ...; incrc(rn); ’
                              if tailpos?
                              then ‘let r = new(c(r1, ..., rn)) in
                                    gcExcept(u1, r); ...; gcExcept(um, r); r’
                              else ‘new(c(r1, ..., rn))’

Lx[p(e1, ..., en)] tailpos? = if tailpos? then ‘let r = p(Lx[e1] false, ..., Lx[en] false) in
                                                 gc(u1); ...; gc(um);
                                                 r’
                              else ‘p(Lx[e1] false, ..., Lx[en] false)’

Lx[c?(e)] tailpos? = ‘let x = ’ Lx[e] false ‘ in
                      let r = (if isPrim(x) then c?(x) else c?(data(x))) in
                      (if not(isPrim(x)) and rc(x) = 0 then gc(x)); ’
                     if tailpos? then ‘gc(u1); ...; gc(um); r’
                     else ‘r’

Lx[c^{-i}(e)] tailpos? = ‘let x = ’ Lx[e] false ‘ in
                          let r = c^{-i}(data(x)) in
                          (if not(isPrim(x)) and rc(x) = 0 then gcExcept(x, r)); ’
                         if tailpos? then ‘gcExcept(u1, r); ...; gcExcept(um, r); r’
                         else ‘r’

Lx[if e1 then e2 else e3] tailpos? = ‘if Lx[e1] false then Lx[e2] tailpos? else Lx[e3] tailpos?’

Lx[let v = e1 in e2] tailpos? = ‘let v = Lx[e1] false in
                                 incrc(v); ’
                                if tailpos? then ‘Lx[e2] true’
                                else ‘let r = Lx[e2] false in
                                      gcExcept(v, r); r’

Lx[f(e1, ..., en)] tailpos? = if tailpos?
                              then ‘let r1 = Lx[e1] false, ..., rn = Lx[en] false in
                                    incrc(r1); ...; incrc(rn);
                                    gc(u1); ...; gc(um);
                                    decrc(r1); ...; decrc(rn);
                                    f_Lx(r1, ..., rn)’
                              else ‘f_Lx(Lx[e1] false, ..., Lx[en] false)’


Figure 6: Transformation that produces live heap space functions f_Lx handling tail call optimization. u1, ...,
um are variables in the current scope.


7  Experiments 

We  implemented  the  analyses  and  measured  the  results  for  several  standard  list  and  tree  processing  programs. 
Comparisons  of  results  of  space  functions  and  bound  functions  show  that  bound  functions  produce  accurate 
f_Lxb(v1, ..., vn) = ‘incrc(v1); ...; incrc(vn);
                      Lxb[e] true’
    where e is the body of function f, i.e., f(v1, ..., vn) = e

Lxb[v] tailpos? = if tailpos? then ‘gcExcept(u1, v); ...; gcExcept(um, v); v’ else ‘v’
Lxb[l] tailpos? = if tailpos? then ‘gc(u1); ...; gc(um); l’ else ‘l’

Lxb[c(e1, ..., en)] tailpos? = ‘live[i_c]++; if (P · live > P · maxlive)
                                then for c ∈ Constructors maxlive[i_c] := live[i_c];
                                let r1 = ’ Lxb[e1] false ‘, ..., rn = ’ Lxb[en] false ‘ in
                                incrc(r1); ...; incrc(rn); ’
                               if tailpos?
                               then ‘let r = new(c(r1, ..., rn)) in
                                     gcExcept(u1, r); ...; gcExcept(um, r); recomputeExs(r); r’
                               else ‘new(c(r1, ..., rn))’

Lxb[p(e1, ..., en)] tailpos? = if tailpos? then ‘let r = p_u(Lxb[e1] false, ..., Lxb[en] false) in
                                                  gc(u1); ...; gc(um);
                                                  r’
                               else ‘p_u(Lxb[e1] false, ..., Lxb[en] false)’
p_u(v1, ..., vn) = if v1 = uk or ... or vn = uk then uk else p(v1, ..., vn)

Lxb[c?(e)] tailpos? = ‘let x = ’ Lxb[e] false ‘ in
                       let r = c?_u(x) in
                       (if not(isPrim(x)) and rc(x) = 0 then gc(x)); ’
                      if tailpos? then ‘gc(u1); ...; gc(um); r’
                      else ‘r’
c?_u(v) = if v = uk then uk
          else if isJoin(v) then if length(branches(v)) = 1
                                 then join(false, c?_u(first(branches(v))))
                                 else join(c?_u(first(branches(v))), c?_u(second(branches(v))))
          else c?(data(v))

Lxb[c^{-i}(e)] tailpos? = ‘let x = ’ Lxb[e] false ‘ in
                           let r = c^{-i}_u(x) in
                           (if not(isPrim(x)) and rc(x) = 0 then gcExcept(x, r)); ’
                          if tailpos? then ‘gcExcept(u1, r); ...; gcExcept(um, r); recomputeExs(r); r’
                          else ‘recomputeExs(r); r’
c^{-i}_u(v) = if v = uk then uk
              else if isJoin(v)
                   then if length(branches(v)) = 1 then c^{-i}(false)
                        else join(c^{-i}_u(first(branches(v))), c^{-i}_u(second(branches(v))))
              else c^{-i}(data(v))

Lxb[let v = e1 in e2] tailpos? = ‘let v = Lxb[e1] false in
                                  incrc(v); ’
                                 if tailpos? then ‘Lxb[e2] true’
                                 else ‘let r = Lxb[e2] false in
                                       gcExcept(v, r); recomputeExs(r); r’

Lxb[f(e1, ..., en)] tailpos? = if tailpos?
                               then ‘let r1 = Lxb[e1] false, ..., rn = Lxb[en] false in
                                     incrc(r1); ...; incrc(rn);
                                     gc(u1); ...; gc(um);
                                     decrc(r1); ...; decrc(rn);
                                     recomputeExs(r1); ...; recomputeExs(rn);
                                     f_Lxb(r1, ..., rn)’
                               else ‘f_Lxb(Lxb[e1] false, ..., Lxb[en] false)’


Figure 7: Transformation that produces live heap space-bound functions f_Lxb handling tail call optimization.


Lxb[if e1 then e2 else e3] tailpos? =
  if tailpos?
  then if tlred(e2)
       then if tlred(e3)
            then ‘let b = Lxb[e1] false in
                  if b = uk then let l1 = copy(live) in
                                 let r2 = Lxb[e2] true in
                                 let l2 = copy(live) in
                                 recincrc(u1); ...; recincrc(um);
                                 recdecrc(r2);
                                 live := l1; let r3 = Lxb[e3] true in
                                 let l3 = copy(live) in
                                 live := max(l2, l3);
                                 recincrc(r2); let r = join(r2, r3) in
                                 setexs(r, computeExs(r)); r
                  else if b then Lxb[e2] true else Lxb[e3] true’
            else ‘let b = Lxb[e1] false in
                  if b = uk then let l1 = copy(live) in
                                 let r3 = Lxb[e3] false in
                                 let l3 = copy(live) in
                                 recdecrc(r3);
                                 live := l1; let r2 = Lxb[e2] true in
                                 let l2 = copy(live) in
                                 live := max(l2, l3);
                                 recincrc(r3); let r = join(r2, r3) in
                                 setexs(r, computeExs(r)); r
                  else if b then Lxb[e2] true else Lxb[e3] true’
       else if tlred(e3)
            then ‘let b = Lxb[e1] false in
                  if b = uk then let l1 = copy(live) in
                                 let r2 = Lxb[e2] false in
                                 let l2 = copy(live) in
                                 recdecrc(r2);
                                 live := l1; let r3 = Lxb[e3] true in
                                 let l3 = copy(live) in
                                 live := max(l2, l3);
                                 recincrc(r2); let r = join(r2, r3) in
                                 setexs(r, computeExs(r)); r
                  else if b then Lxb[e2] true else Lxb[e3] true’
            else ‘let b = Lxb[e1] false in
                  if b = uk then let l1 = copy(live) in
                                 let r2 = Lxb[e2] false in
                                 let l2 = copy(live) in
                                 live := l1; let r3 = Lxb[e3] false in
                                 let l3 = copy(live) in
                                 live := max(l2, l3);
                                 let r = join(r2, r3) in
                                 setexs(r, min(l2 − l1, l3 − l1));
                                 gcExcept(u1, r); ...; gcExcept(um, r);
                                 r
                  else if b then Lxb[e2] true else Lxb[e3] true’
  else ‘let b = Lxb[e1] false in
        if b = uk then let l1 = copy(live) in
                       let r2 = Lxb[e2] false in
                       let l2 = copy(live) in
                       live := l1; let r3 = Lxb[e3] false in
                       let l3 = copy(live) in
                       live := max(l2, l3); let r = join(r2, r3) in
                       setexs(r, min(l2 − l1, l3 − l1)); r
        else if b then Lxb[e2] false else Lxb[e3] false’


recincrc(v) = incrc(v);
              if (rc(v) ≤ 1)
              then if isJoin(v)
                   then for u in branches(v) recincrc(u)
                   else for i = 1..arity(v) recincrc(c^{-i}(data(v)))


recdecrc(v) = decrc(v);
              if (rc(v) ≤ 0)
              then if isJoin(v)
                   then for u in branches(v) recdecrc(u)
                   else for i = 1..arity(v) recdecrc(c^{-i}(data(v)))

[tlred(e): defined by cases for e of the form l, p(e1, ..., en), c(e1, ..., en), c?(e), c^{-i}(e),
if e1 then e2 else e3, let v = e1 in e2, and f(e1, ..., en); the case bodies are not recoverable.]

Figure 9: Auxiliary functions recincrc, recdecrc and tlred.


results  (i.e.,  tight  bounds)  for  all  these  examples.  The  results  are  consistent  with  the  expected  asymptotic 
space  complexities  of  the  programs.  We  measured  the  running  times  of  space  and  bound  functions  of  all 
examples.  For  most  of  the  examples,  the  bound  functions  have  the  same  asymptotic  time  complexities  as 
the  corresponding  space  functions.  For  all  examples,  a  comparison  of  the  running  times  of  bound  functions 
and  the  running  times  of  space  functions  multiplied  by  the  number  of  represented  inputs  showed  that  the 
bound  functions  are  asymptotically  faster  than  applying  the  corresponding  space  functions  to  all  represented 
inputs.  The  non-termination  issue  mentioned  in  Section  1  is  not  a  problem  for  any  of  these  examples. 

Figure  10  contains  the  results  of  live  heap  space  analysis  on  some  examples.  For  all  examples  except 
quicksort,  we  show  only  the  results  of  bound  functions  on  partially  known  inputs,  because  they  are  the  same 
as the results of the space functions on worst-case input. Reversal using append is the standard quadratic-time
version. The version of merge sort tested is the one that splits the input list into sublists containing the
elements  at  odd  and  even  positions.  Dynamic  programming  algorithms  [5]  are  used  for  binomial  coefficient, 
longest  common  subsequence  and  string  edit.  Binary-tree  insertion  involves  insertion  of  an  item  into  a 
complete  binary  tree  in  which  each  node  is  a  list  containing  an  element  and  left  and  right  subtrees. 

The  partially  known  inputs  for  the  bound  functions  of  reversal  and  sorting  are  lists  of  known  lengths  n 
where  all  elements  are  uk;  those  for  longest  common  subsequence  and  string  edit  are  two  such  lists  of  equal 
length  n.  The  bound  function  for  binary-tree  insertion  inserts  uk  into  a  complete  binary  tree  of  known  height 
h with unknown elements. For binomial coefficient we use integer arguments, n and n - 2, since it was found
that for a given n, a value of n - 2 for the second argument leads to maximum live heap usage.

The difference between the results of the space and bound functions of quicksort is explained as follows.
Quicksort selects a pivot element p from the given list ls and splits the list into two lists: one containing all
elements of ls that are less than or equal to p and the other containing the elements that are greater than p.
Suppose ls has size n. Since all elements in ls are uk, the bound function incorrectly concludes that each of
the two lists has worst-case size n - 1. In reality, the sum of the sizes of the two lists is n - 1. Further, since
quicksort is called recursively on the two lists, the inaccuracies at every recursion level compound, resulting
in exponential growth of the bound function results.
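
The compounding effect can be illustrated with a toy cost model. The sketch below is not the live heap measure computed by our analysis; it simply sums the sizes of all partition lists over the recursion tree, once under the over-approximation (both partitions assumed to have size n - 1) and once for the maximally unbalanced actual split. The function names are hypothetical.

```python
def overapprox_partition_cells(n):
    """Toy model: every split of a list of size n is assumed to produce
    two partitions of size n - 1 each (the over-approximation made when
    all elements are unknown)."""
    if n <= 1:
        return 0
    # Two partitions of size n - 1, each processed recursively.
    return 2 * (n - 1) + 2 * overapprox_partition_cells(n - 1)


def unbalanced_partition_cells(n):
    """Toy model: the partitions of a list of size n hold n - 1 elements
    in total, all of them in one partition (maximally unbalanced split)."""
    if n <= 1:
        return 0
    return (n - 1) + unbalanced_partition_cells(n - 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, unbalanced_partition_cells(n), overapprox_partition_cells(n))
```

In this model the unbalanced count grows as n(n - 1)/2, while the over-approximated count roughly doubles with each additional element.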

In time analysis [22] and stack space analysis [31], the bound functions of quicksort run into the
non-termination problem. This is because results of conditionals are summarized into partially known structures
and  the  recursion  in  quicksort  depends  on  an  unknown  aspect  of  such  a  partially  known  structure.  In  live 
heap  analysis,  we  retain  more  information  about  the  results  of  branches  of  conditionals  by  using  join-values 
rather  than  partially  known  structures.  The  mentioned  recursion  in  quicksort  now  has  sufficient  information 
to  terminate. 

Figure 10: Results of live heap space analysis. n is the input size except in the case of binomial coefficient,
in  which  n  is  the  first  argument.  For  binary  tree  insert,  h  is  the  height  of  the  complete  binary  tree  and  n  is 
the  number  of  nodes  in  the  tree. 

Figure 11: Running times of live heap space and live heap space-bound functions. Columns S, B and Bopt
contain times of space, unoptimized space-bound and optimized space-bound functions, respectively. n and
h  are  as  in  Figure  10.  m  is  milliseconds,  s  is  seconds,  M  is  minutes  and  H  is  hours.  Blank  fields  in  B  columns 
indicate  analyses  that  were  aborted  after  7  days.  Blank  fields  in  Bopt  columns  indicate  analyses  that  were 
aborted  after  2  days. 


The  results  in  Figure  10  include  the  space  used  by  top-level  arguments  since  these  arguments  are  indeed 
live  throughout  the  execution  of  the  program.  Figure  11  contains  running  times  of  live  heap  space  analysis 
on  a  sampling  of  input  sizes.  For  all  examples,  the  live  heap  space  function  has  the  same  asymptotic 
time  complexity  as  the  original  function.  The  time  complexities  of  the  live  heap  space-bound  functions  of 
reverse  using  append,  binomial  coefficient,  string  edit  and  longest  common  subsequence  are  the  same  as 
the  complexities  of  the  corresponding  original  functions.  The  time  complexities  of  the  bound  functions  of 
insertion  sort  and  selection  sort  are  a  linear  factor  more  than  those  of  the  original  functions.  The  linear 
factor  is  due  to  the  computation  involved  in  the  reduction  of  join-values.  The  running  time  of  the  bound 
function  of  merge  sort  is  more  than  polynomial  in  the  size  of  the  input.  This  is  because  the  analysis  examines 
all (n + m)!/(n! × m!) ways in which two sorted lists of sizes n and m may be merged in sorted order. The
running  time  of  the  bound  function  of  binary  tree  insert  is  polynomial  in  the  size  of  the  input. 
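
To see why this count is super-polynomial, the following sketch counts the interleavings directly and compares the count with the binomial coefficient. It assumes only that, with unknown elements, either list head may be emitted first at each merge step; `count_merges` is a hypothetical helper, not part of our implementation.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_merges(n, m):
    """Number of distinct orders in which two lists of sizes n and m can be
    interleaved when every comparison outcome is unknown."""
    if n == 0 or m == 0:
        # Only one way remains: copy the rest of the non-empty list.
        return 1
    # Either the head of the first list or the head of the second comes next.
    return count_merges(n - 1, m) + count_merges(n, m - 1)


if __name__ == "__main__":
    for k in (2, 5, 10, 15):
        # The direct count agrees with (n + m)! / (n! * m!).
        print(k, count_merges(k, k), comb(2 * k, k))
```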

The  first  optimization  in  Section  5  improves  the  asymptotic  complexities  of  reverse  using  append  and 
binomial  coefficient  by  a  linear  factor.  The  second  optimization  improves  the  asymptotic  complexities  of 
insertion  sort,  selection  sort  and  longest  common  subsequence  from  greater  than  polynomial  to  polynomial. 
These  speedups  are  shown  in  Figure  11. 

Figure  12  shows  the  results  of  the  space  analysis  that  deals  with  tail  call  optimization.  The  corresponding 
space-bound  analysis  is  not  yet  implemented.  List  reversal  is  the  standard  linear-time  version.  The  optimized 
Ackermann  function  example  [23]  is  a  systematically  derived  program  which  has  much  better  time  complexity 
than the classical version. In all of these examples, the main function is tail recursive. The inputs for the
first  three  examples  are  known  lists  of  length  n  that  lead  to  the  worst-case  space  usage.  The  inputs  for 
Ackermann’s  function  are  known  integers,  n  and  m.  The  space  complexity  of  this  program  was  worked  out 
by hand to be O(n), but it is hard to see this because of the complicated space usage of the program. For
each value of n for Ackermann’s function in Figure 12, all values of the second argument in the range [2, 10]
produced the same space usage. While this does not prove O(n) space usage, it does help confirm that, for
a  given  n,  the  space  usage  is  independent  of  m.  Computing  Ackermann’s  function  for  n  >  3  is  famously 
expensive. 

Figure 12: Results of live heap space functions handling tail call optimization. n is the input size except in
the  case  of  optimized  Ackermann  function,  in  which  n  is  the  first  argument;  the  second  argument  m  ranges 
over  the  interval  [2, 10],  for  each  n. 


Comparing  the  results  in  Figure  12  with  the  space  usage  of  the  corresponding  non-tail-recursive  programs, 
shown  in  Figure  10,  we  see  that  the  tail  recursive  versions  use  less  live  heap  space.  For  example,  tail-recursive 
selection sort uses only O(n) space, while non-tail-recursive selection sort uses O(n^2) space. We also applied
our  extended  analysis  to  the  examples  of  Figure  10.  The  results  were  either  the  same  or  differed  by  a  constant 
amount  of  1  or  2  cons  cells.  The  differences  occur  because  the  programs  do  contain  some  function  calls  in 
tail  position. 

Figure  13  contains  closed  forms  or  recurrence  relations  for  the  live  heap  space  used  by  the  examples.  We 
obtained  closed  forms  by  using  Matlab  to  fit  polynomials,  wherever  applicable,  to  the  data  in  Figure  10.  The 
recurrence  relation  for  merge  sort  and  the  closed  form  for  binary  tree  insert  were  obtained  by  hand  from  the 
results  in  Figure  10. 

Figure 13: Closed forms or recurrence relations for live heap space used by example programs. n and m are
the  sizes  of  the  first  and  second  (if  any)  arguments.  For  binomial  coefficient,  n  and  m  are  themselves  the 
first  and  second  arguments.  For  binary  tree  insert,  h  is  the  height  of  the  complete  binary  tree. 


We applied our analysis to a 600-line calendar benchmark. The partially known inputs used are partially
known dates. The analysis takes only a few seconds to complete and yields tight bounds, providing
preliminary  evidence  for  the  scalability  of  our  method.  We  plan  to  analyze  more  benchmarks.  We  have  also 
used  the  analysis  in  teaching  programming  languages  courses. 

8 Discussion


Scalability.  For  large  programs  or  programs  with  sophisticated  control  structures,  the  analysis  is  efficient 
if  the  input  parameters  are  small,  but  for  larger  parameters,  efficiency  might  be  a  challenge.  However, 
from  the  results  of  bound  analysis  on  smaller  inputs,  we  may  semi-automatically  derive  closed  forms  and/or 
recurrence  relations  that  describe  the  program’s  space  usage,  by  fitting  a  given  functional  form  to  the  analysis 
results.  Also,  when  closed  forms  or  recurrence  relations  are  known,  we  may  use  the  results  of  the  analysis  to 
determine  exact  coefficients.  The  closed  forms  or  recurrence  relations  may  then  be  used  to  determine  space 
bounds  for  large  inputs. 
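
As a minimal sketch of this fitting step, the code below determines exact polynomial coefficients from a few measured points using rational arithmetic and then extrapolates to a large input. The sample data are synthetic (generated from n^2 + 3n + 2 for illustration) and are not results from our experiments; the function names are hypothetical, and the functional form (a polynomial of given degree) is assumed to be supplied by the user.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + ... + cd*x^d by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def fit_polynomial(points, degree):
    """Return exact coefficients [c0, ..., cd] of the polynomial of the given
    degree through the first degree + 1 points (distinct x values assumed),
    and check that the remaining points agree with it."""
    n = degree + 1
    # Augmented Vandermonde system, solved by Gauss-Jordan elimination.
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
            for x, y in points[:n]]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [a / pv for a in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    coeffs = [rows[i][n] for i in range(n)]
    for x, y in points[n:]:
        if evaluate(coeffs, x) != y:
            raise ValueError("data do not match a polynomial of this degree")
    return coeffs


if __name__ == "__main__":
    # Synthetic (input size, space) pairs; NOT measurements from the paper.
    samples = [(1, 6), (2, 12), (3, 20), (4, 30), (5, 42)]
    coeffs = fit_polynomial(samples, 2)
    print([str(c) for c in coeffs])   # coefficients c0, c1, c2
    print(evaluate(coeffs, 1000))     # extrapolated value for n = 1000
```

Checking the fitted form against the points not used in the fit gives some evidence that the assumed degree is adequate, though it does not prove the closed form.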

Termination.  The  space  function  terminates  iff  the  original  program  terminates.  The  bound  function 
might not terminate, even when the original program does, if the recursive structure of the original program
directly  or  indirectly  depends  on  unknown  parts  of  a  partially  known  input  structure.  For  example,  if  the 
given  partially  known  input  structure  is  uk,  then  the  bound  function  for  any  recursive  program  does  not 
terminate;  if  such  a  bound  function  counts  new  space,  then  the  original  program  might  indeed  take  an 
unbounded  amount  of  space.  Indirect  dependency  on  unknown  data  can  be  caused  by  an  imprecise  join 
operation.  Making  the  join  operation  more  precise  eliminates  this  source  of  non-termination. 
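
The following sketch illustrates the direct dependency on unknown data. It is deliberately simplified: it bounds only the recursion depth of a list-length function rather than live heap space, and it uses a step budget (the hypothetical `fuel` parameter) to stand in for non-termination. The list representation and all names are assumptions made for illustration.

```python
UK = "uk"    # the unknown value
NIL = None   # empty list; a non-empty list is a pair (head, tail)


class OutOfFuel(Exception):
    """Raised when the step budget runs out (stands in for non-termination)."""


def is_null(x):
    """Three-valued test: True, False, or UK when x itself is unknown."""
    if x == UK:
        return UK
    return x is NIL


def cdr(x):
    """Tail of a list; selecting from an unknown value yields unknown."""
    return UK if x == UK else x[1]


def length_bound(x, fuel=100):
    """Upper bound on the recursion depth of length(x) for partially known x."""
    if fuel == 0:
        raise OutOfFuel()
    b = is_null(x)
    if b == UK:
        # Unknown test: explore both branches and keep the larger result.
        return max(0, 1 + length_bound(cdr(x), fuel - 1))
    if b:
        return 0
    return 1 + length_bound(cdr(x), fuel - 1)


if __name__ == "__main__":
    # Known spine, unknown elements: the recursion is fully determined.
    print(length_bound((UK, (UK, NIL))))
    # Entirely unknown input: the recursion never reaches a known base case.
    try:
        length_bound(UK)
    except OutOfFuel:
        print("bound function does not terminate on uk")
```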

Although  there  are  methods  to  deal  with  non-termination,  incorporating  such  methods  in  our  analysis 
could  result  in  loose  bounds  on  space  usage,  even  for  programs  for  which  non-termination  is  not  a  problem. 
Further,  non-termination  is  not  a  problem  in  any  of  the  examples  we  analyzed. 

Inputs  to  Bound  Functions.  To  analyze  space  usage  with  respect  to  some  property  of  the  input,  we 
need  to  formulate  sets  of  partially  known  inputs  that  represent  all  actual  inputs  with  that  characteristic, 
e.g.,  all  lists  with  length  n,  all  binary  trees  of  height  h  or  all  binary  trees  with  n  nodes.  As  an  example, 
{(uk, (uk, nil, nil), nil), (uk, nil, (uk, nil, nil)), (uk, (uk, nil, nil), (uk, nil, nil))} represents all binary trees of
height  1,  each  node  being  a  list  of  the  element  and  left  and  right  subtrees.  Often,  formulating  such  sets  of 
partially  known  inputs  is  straightforward  but  tedious  for  the  user  to  do  by  hand.  However,  it  is  easy  to  write 
programs  that  generate  such  sets  of  partially  known  inputs. 
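
For instance, a generator for all binary trees of a given height with unknown elements might look as follows. This is a sketch under assumed conventions: nodes are tuples (uk, left, right), the empty tree is the string "nil", and a single node has height 0, which matches the height-1 example above. The function names are hypothetical.

```python
UK = "uk"
NIL = "nil"


def height(tree):
    """Height of a tree; the empty tree has height -1, a single node 0."""
    if tree == NIL:
        return -1
    return 1 + max(height(tree[1]), height(tree[2]))


def trees_up_to(h):
    """All tree shapes of height at most h, with every element unknown."""
    if h < 0:
        return [NIL]
    smaller = trees_up_to(h - 1)
    return [NIL] + [(UK, left, right) for left in smaller for right in smaller]


def trees_of_height(h):
    """All tree shapes of height exactly h, with every element unknown."""
    return [t for t in trees_up_to(h) if height(t) == h]


if __name__ == "__main__":
    for t in trees_of_height(1):
        print(t)
    print(len(trees_of_height(2)))
```

The number of shapes grows very quickly with h, so such sets are practical only for small heights.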

Imperative  Update  and  Higher-Order  Functions.  The  ideas  in  this  paper  may  be  combined  with 
reference-counting garbage collection extended to handle cycles [3] or with other garbage collection algorithms,
such as mark and sweep, to obtain a live heap space analysis for imperative languages. They may
also  be  combined  with  techniques  for  analysis  of  higher-order  functions  [30,  11]. 

Correctness. A complete proof of correctness of the analysis is yet to be formulated. Included below is
one  part  of  the  proof. 

Lemma  1  For  a  join-value  j,  it  is  safe  to  ignore  a  smaller  recomputed  value  of  exs(j). 

Proof:  Suppose  exs(j)  is  computed  at  point  p  and  then  recomputed  at  point  q  and  exs(j)  at  q  is  less  than 
exs(j)  at  p.  We  prove  that  this  may  happen  only  if  after  p,  a  selector  is  applied  to  j,  thus  creating  a  new 
join-value  j'  which  references  parts  of  j.  The  referenced  parts  of  j  are  no  longer  contained-in  j  and  when 
exs(j)  is  recomputed  at  q,  it  is  smaller  than  the  earlier  value  of  exs(j).  It  is  indeed  safe  to  ignore  any 
decrease  in  exs(j)  caused  by  selector  application  because  selection  does  not  alter  the  fact  that  only  one  of 
the data constructions represented by j is live. We keep only one such data construction live by recording the
excess in the exs field of j and setting exs(j'), for any j' selected from j, to V0. This ensures that excesses
are treated as such exactly once. From the definition of exs (2), exs(j) decreases only if one or more of the
following occurs:

1. total(j) decreases: this happens if some con-value c ∈ C_j is not contained-in j at q. Since c is
contained-in j at p, there is no way to create a new reference to c other than by applying a selector to
j. As we explained earlier, we may safely ignore decreases of exs(j) caused by selection from j.

2. max(j) increases: this can happen only if some con-value c ∉ C_j is contained-in j at q and c is in a
maximum-sized join-path of j at q. By definition, the size of c is also included in total(j) at q. Thus,
c being contained-in j at q (and not at p) causes at least as much increase in total(j) as in max(j). Thus,
an increase in max(j) doesn’t actually contribute to a decrease in exs(j).

3. sub(j) increases: sub(j) increases only if (a) for some join-value j' ∈ C_j ∩ C'_j, exs(j') increases (after
p) or (b) there exists a join-value j' ∈ C'_j \ C_j such that exs(j') > V0.

(a) j' ∈ C_j implies that from point p onwards, it is impossible to obtain a direct reference to j', i.e.,
no expression may evaluate to j'. Expressions may evaluate to results that contain j', but even
then every path from such a result r to j' must contain a join-value, either j itself or a join-value
obtained by selection from j. The analysis calls recomputeExs only on results of expressions.
Thus, after point p, recomputeExs is never called directly on j'. It may be called on a result r
that satisfies the constraints just described. By definition, recomputeExs(r) recomputes the exs
fields of any join-value ancestors of j' but does not recompute exs(j'). Hence, for all j' ∈ C_j,
exs(j') at p' = exs(j') at p, for any point p' > p. Thus, case (3a) does not cause any increase in sub(j).

(b) Let j' be a join-value in C'_j \ C_j such that exs(j') at q > V0. exs(j') includes only the sizes of
con-values contained-in j'. Since j' is contained-in j, any con-value c ∈ C'_{j'} is also contained-in j at q.
Therefore, total(j) at q also includes the size of c. Thus, any increase in sub(j) is accompanied by
at least an equal increase in total(j), effectively leaving exs(j) unchanged or larger than before.


9  Related  Work 

There  has  been  a  large  amount  of  work  on  analyzing  program  cost  or  resource  complexities,  but  the  majority 
of  it  is  on  time  analysis,  e.g.,  [21,  28,  30,  22].  Stack  space  and  heap  allocation  analysis  [31]  is  similar  to 
time analysis [22]. Analysis of live heap space is different because it involves explicit analysis of the graph
structure  of  the  data. 

Most  of  the  work  related  to  analysis  of  space  is  on  analysis  of  cache  behavior,  e.g.,  [32,  10],  much  of  which 
is  at  a  lower  language  level,  for  compiler  generated  code,  while  our  analyses  are  at  source  level  and  can  serve 
many  purposes,  as  discussed  in  Section  1.  Live  heap  space  analysis  is  also  a  first  step  towards  analyzing 
cache  behavior  in  the  presence  of  garbage  collection. 

Persson’s  work  on  live  memory  analysis  [26]  for  an  object-oriented  language  requires  programmers  to 
give  annotations,  including  specific  numbers  as  bounds  for  the  size  of  recursive  data  structures.  His  work  is 
preliminary:  the  presentation  is  informal,  with  a  few  formulas  summarizing  sizes  of  data  in  bytes  based  on 
the  annotations,  and  only  one  example,  summing  a  list,  is  given.  Our  analysis  is  able  to  compute  bounds 
based  on  input  size  only,  without  program  annotations. 

Unlike  static  reference  counting  used  in  analysis  for  compile-time  garbage  collection  [18,  16],  our  analysis 
uses  a  reference  counting  method  similar  to  that  in  run-time  garbage  collection.  While  the  former  keeps 
track  of  pointers  to  memory  cells  that  will  be  used  later  in  the  execution,  the  latter  maintains  pointers 
reachable  from  the  stack  at  the  current  point  in  execution.  Our  analysis  could  be  modified  so  that  decrc(v) 
is called when a parameter or let-variable won’t be used again (instead of waiting until v goes out of scope).
Our  current  analysis  corresponds  to  the  garbage  collection  behavior  in,  e.g.,  JVMs  from  Sun,  IBM,  and 
Transvirtual.  Inoue  and  others  [15]  analyze  functional  programs  to  detect  run-time  garbage  conservatively 
at  compile-time.  Their  result  is  an  approximation  without  any  information  about  the  input.  Also,  they  do 
not  compute  the  size  of  live  space. 

Several  type  systems  [14,  13,  6]  have  been  proposed  for  reasoning  about  space  and  time  bounds,  and 
some  of  them  include  implementations  of  type  checkers  [14,  6].  They  require  programmers  to  annotate  their 
programs  with  cost  functions  as  types.  Furthermore,  some  programs  must  be  rewritten  to  have  feasible  types 
[14,  13]. 

Chin  and  Khoo  [4]  propose  a  method  for  calculating  sized  types  by  inferring  constraints  on  size  and  then 
simplifying  the  constraints  using  Omega  [27].  Their  analysis  results  do  not  correspond  to  live  heap  space  in 
general.  Further,  Omega  can  only  reason  about  constraints  expressed  as  linear  functions. 

To summarize, this work is a first attempt to analyze live heap space automatically and accurately using
source-level  program  analysis  and  transformations.  The  analysis  can  be  modified  to  reflect  the  effect  of 
optimization  of  tail  recursion.  The  ideas  in  this  paper  may  be  combined  with  reference-counting  garbage 
collection  extended  to  handle  cycles  [3]  or  with  other  garbage  collection  algorithms,  such  as  mark  and  sweep, 
to  obtain  a  live  heap  space  analysis  for  imperative  languages.  They  may  also  be  combined  with  techniques 
for analysis of higher-order functions [30, 22].